The aim of this report is to characterize neural cells in the developing
human cortex using the provided Seurat object.
To gain insight from your data, I propose the following analyses.
The dataset consists of single-cell data from human cortex cells: 5000
cells from 3 batches.
I will run the Seurat workflow after checking whether there is any batch
effect to correct.
As Figure 1 (violin plots) shows, the provided data are of good
quality: 0% of mitochondrial genes, a minimum of 501 genes detected
per cell, and a minimum of 642 molecules detected per cell.
NB: These data appear to be already filtered, so no additional cutoff
is needed.
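To make the three quality metrics above concrete, here is a minimal sketch of how they can be derived from a raw count matrix. It is not the Seurat implementation: the helper `qc_metrics`, the toy matrix, and the `MT-` prefix used to flag mitochondrial genes are assumptions made for illustration.

```python
# Illustrative sketch only; the helper name, toy counts and "MT-" prefix
# are assumptions, not values taken from the provided Seurat object.

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Return one (n_genes, n_molecules, percent_mito) tuple per cell.

    `counts` is a genes x cells matrix given as a list of rows.
    """
    n_cells = len(counts[0])
    metrics = []
    for c in range(n_cells):
        column = [row[c] for row in counts]
        n_molecules = sum(column)                   # total counts in the cell
        n_genes = sum(1 for v in column if v > 0)   # genes with at least one count
        mito = sum(v for v, g in zip(column, gene_names)
                   if g.startswith(mito_prefix))
        pct = 100.0 * mito / n_molecules if n_molecules else 0.0
        metrics.append((n_genes, n_molecules, pct))
    return metrics


# Toy data: 4 genes (rows) x 3 cells (columns).
gene_names = ["SOX2", "PAX6", "MT-CO1", "DCX"]
counts = [
    [3, 0, 5],
    [1, 2, 0],
    [0, 0, 1],
    [4, 6, 4],
]

metrics = qc_metrics(counts, gene_names)
for cell, (n_genes, n_mol, pct) in enumerate(metrics):
    print(f"cell {cell}: genes={n_genes}, molecules={n_mol}, "
          f"percent_mito={pct:.1f}")
```

On real data, the minimum of the first two values across all cells corresponds to the per-cell minima quoted above.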
Clustering with Seurat, followed by annotation with publicly available
marker genes, will identify all cell populations present in your dataset.
This will allow us to explore the heterogeneity of neural subtypes during
this developmental window (pcw16, pcw20, pcw21 and pcw24).
A potential limitation here is that the provided dataset may not
capture the full diversity of cell subtypes, since some cell types are
less abundant. Cluster assignment can also be difficult because some
marker genes are not robust.
However, when I checked the metadata slot of the provided Seurat object,
I saw that the pre-analysis (including clustering and cell population
definition) had already been done. As Figure 2 shows, 23 cell types
were identified (with one cluster of unknown cells).
No batch effect was detected: in Figure 3, populations coming from
different batches cluster together, which is why I will skip the
correction step.
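The visual check can be complemented by a simple numeric one: the share of each batch within each cluster. The sketch below is only an illustration with made-up labels (`cluster_A`, `batch1`, and so on are hypothetical). A cluster dominated by a single batch is a hint worth investigating, not proof of a batch effect, since batches may coincide with developmental stages.

```python
from collections import Counter, defaultdict


def batch_fractions_per_cluster(clusters, batches):
    """Return {cluster: {batch: fraction of the cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        counts[cluster][batch] += 1
    fractions = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        fractions[cluster] = {b: n / total for b, n in counter.items()}
    return fractions


# Toy per-cell labels (hypothetical, one entry per cell).
clusters = ["cluster_A"] * 4 + ["cluster_B"] * 4
batches = ["batch1", "batch2", "batch3", "batch1",
           "batch2", "batch2", "batch2", "batch2"]

fractions = batch_fractions_per_cluster(clusters, batches)
for cluster, shares in fractions.items():
    print(cluster, shares)
```

In this toy example `cluster_A` mixes all three batches, while `cluster_B` contains a single batch and would deserve a closer look.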
After quality control and cell type identification, we can perform a
differential gene expression (DEG) analysis to identify genes that are
specifically upregulated in each subtype.
This will provide insights into the molecular characteristics of these
subtypes and potentially reveal functional differences.
Since the dataset is a subset, which might limit the depth of analysis,
I will normalize and scale the data before the marker identification
step.
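As a minimal sketch of what normalization and scaling do, the code below applies a log-normalization (counts divided by the cell total, multiplied by a scale factor of 10,000, then transformed with log1p, which matches Seurat's default "LogNormalize" method) followed by per-gene centering and scaling. The function names and the toy matrix are assumptions for illustration, and details of Seurat's scaling step such as clipping of extreme values are left out.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize a genes x cells count matrix, cell by cell."""
    n_cells = len(counts[0])
    totals = [sum(row[c] for row in counts) for c in range(n_cells)]
    return [
        [math.log1p(row[c] / totals[c] * scale_factor) if totals[c] else 0.0
         for c in range(n_cells)]
        for row in counts
    ]


def scale_genes(matrix):
    """Center each gene (row) to mean 0 and scale it to unit variance."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / (len(row) - 1)
        sd = math.sqrt(var)
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return scaled


# Toy data: 2 genes (rows) x 3 cells (columns).
toy_counts = [
    [1, 3, 0],
    [3, 1, 4],
]
normalized = log_normalize(toy_counts)
scaled = scale_genes(normalized)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, every gene has the same weight in downstream steps such as the heatmap, regardless of its absolute expression level.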
The heatmap shows the expression of the top 30 markers for each cell
population found. Several observations can be drawn from it. For
example, Outer radial glia 1 and 2 seem to express the same blocks of
genes, with one block (the first from the top) overexpressed in the
Outer radial glia 2 population (the same holds for Interneurons 1 and 2) …
NB: I kept the “unknown” population on purpose, to try to extract information about it (manually, from the literature) after extracting its list of marker genes.
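The idea of selecting top markers for one population can be sketched as a one-versus-rest comparison: for each gene, compare its mean expression inside the population with its mean expression in all other cells, and keep the genes with the largest positive log2 fold change. This is a simplified illustration, not the method used for the heatmap; Seurat's marker detection additionally applies a statistical test, which is omitted here. The function name, the pseudocount and the toy values are assumptions.

```python
import math


def top_markers(expr, gene_names, labels, target, n_top=2, pseudocount=1.0):
    """Rank genes upregulated in `target` cells by log2 fold change.

    `expr` is a genes x cells matrix; `labels` gives one label per cell.
    Both groups (target and rest) are assumed to be non-empty.
    """
    in_idx = [i for i, lab in enumerate(labels) if lab == target]
    out_idx = [i for i, lab in enumerate(labels) if lab != target]
    scored = []
    for gene, row in zip(gene_names, expr):
        mean_in = sum(row[i] for i in in_idx) / len(in_idx)
        mean_out = sum(row[i] for i in out_idx) / len(out_idx)
        lfc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        if lfc > 0:  # keep only genes upregulated in the target population
            scored.append((lfc, gene))
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:n_top]]


# Toy data: 4 hypothetical genes x 4 cells.
toy_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_expr = [
    [8, 10, 1, 1],
    [2, 2, 2, 2],
    [4, 2, 0, 0],
    [0, 0, 5, 7],
]
toy_labels = ["Unknown", "Unknown", "Other", "Other"]

markers = top_markers(toy_expr, toy_genes, toy_labels, target="Unknown")
print(markers)
```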
List of the top 30 marker genes of “Unknown” cells: FOXD2, NGF, TNNT2,
FMOD, OSR1, SIX2, NA, EFEMP1, ACTG2, RAB17, TWIST2, UGT3A2, C7, TFAP2B,
SCARA5, OGN, OMD, PRRX2, SLC22A6, MRGPRF, PGR, ITIH2, GJB2, DLK1,
ALDH1A2, ISLR, ALDH1A3, C16orf89, SERPIND1, PVALB
For example: the specific function of FOXD2 has not yet been
determined; NGF: nerve growth factor; TNNT2: troponin T2, cardiac type …
As cells move between states, they undergo a process of
transcriptional re-configuration, with some genes being silenced and
others newly activated, producing a dynamic repertoire of proteins and
metabolites that carry out their work. To explore the developmental
trajectories of identified cell subtypes in the human cortex, I will
perform a trajectory analysis. This could reveal the differentiation
paths and how these subtypes emerge during this developmental window
(from pcw16 to pcw24).
The problem here is that trajectory analysis assumes a linear
progression, which might not always reflect real physiological
cellular differentiation (a complex process).
To do that, I will use monocle3, an algorithm that learns the sequence of gene expression changes each cell must go through as part of a dynamic biological process. Once it has learned the overall “trajectory” of gene expression changes, Monocle can place each cell at its proper position in the trajectory. The workflow follows monocle’s guide.
Figure 6: UMAP of cells colored by pseudotime
As Figure 6 shows, the cells are plotted over a pseudotime ranging from
zero (blue) to 15 (yellow). The cells in dark blue are assumed to be
the first to appear during differentiation (according to Figures 7 and
8, these are Outer radial glia and Microglia from pcw16 and pcw20). The
intermediate stage, in red, corresponds to Migrating glutamatergic
neurons from pcw20 and pcw21, and the final stage, in yellow,
corresponds to Interneurons from pcw24.
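To clarify what a pseudotime value means, it can be thought of as the distance travelled along the learned trajectory graph from a chosen starting point (the root). The sketch below computes such distances with a shortest-path search on a tiny hand-made graph. It is not monocle3 code: the graph, its node names and its edge lengths are hypothetical.

```python
import heapq


def pseudotime_from_root(graph, root):
    """Shortest-path distance from `root` to every reachable node.

    `graph` maps a node to a list of (neighbor, edge_length) pairs.
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical trajectory graph with one branch point at node "a".
toy_graph = {
    "root": [("a", 2.0)],
    "a": [("root", 2.0), ("b", 3.0), ("c", 1.5)],
    "b": [("a", 3.0)],
    "c": [("a", 1.5)],
}

pseudotime = pseudotime_from_root(toy_graph, "root")
for node, value in sorted(pseudotime.items(), key=lambda kv: kv[1]):
    print(node, value)
```

One consequence of this definition is that the ordering depends on the root: choosing a different starting population would change which cells appear "early" and which appear "late".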
## Aligning cells from different batches using Batchelor.
## Please remember to cite:
## Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091
## No preprocess_method specified, and aligned coordinates have been computed previously. Using preprocess_method = 'Aligned'
After all these analyses, we have a global picture of the populations,
their distribution through time, and the genes switched on and off
during each stage.
We could also perform supplementary analyses, such as GO and pathway
enrichment analysis on the differentially expressed genes, to gain
insights into the biological processes and pathways that are enriched
in each cell subtype and each stage.
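Enrichment analysis of this kind is commonly based on an over-representation test: given a list of differentially expressed genes, how surprising is its overlap with the genes annotated to a GO term or pathway? A minimal sketch using the hypergeometric distribution is shown below. The function name and all the numbers are hypothetical, and dedicated enrichment tools also correct for multiple testing, which is not done here.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: genes in the background (e.g. all genes tested)
    n_in_term:    background genes annotated to the term
    n_selected:   genes in the list of interest (e.g. cluster markers)
    n_overlap:    genes of interest that are annotated to the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 30 marker genes, 5 of which belong to a term
# that annotates 100 of 20000 background genes.
p_value = enrichment_p_value(20000, 100, 30, 5)
print(f"p = {p_value:.3g}")
```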
Integrating other data types, e.g. single-cell ChIP-seq, would allow us
to explore chromatin dynamics during cell development and to get a
broader perspective.
Integrating spatial transcriptomic data could produce high-resolution
maps of cellular sub-populations in the human cortex.